Abstract

There has been much recent interest in modelling epidemics on networks, particularly in the presence of substantial clustering. Here, we develop pairwise methods to answer questions that are often addressed using epidemic models, in particular: on the basis of potential observations early in an outbreak, what can be predicted about the epidemic outcomes and the levels of intervention necessary to control the epidemic? We find that while some results are independent of the level of clustering (early growth predicts the level of 'leaky' vaccine needed for control and peak time, while the basic reproductive ratio predicts the random vaccination threshold), the relationship between other quantities is very sensitive to clustering.



Introduction

There has been much recent interest in modelling infectious diseases on contact networks [15, 5]. The incorporation of population structure that deviates from homogeneous mixing is most generally conceptualised as a network of epidemiologically relevant contacts, and an increasing amount of data is available on these contacts, either from surveys [19] or socio-demographic data [9]. One observed feature of realistic contact networks is the presence of an appreciable number of short closed loops in the network, often called clustering. Through construction of networks with special structure, it has recently been shown possible to derive some exact results for clustered networks based on households [5] and non-overlapping triangles [20, 18]. It has also been possible to write down and integrate dynamical systems that capture transient epidemic behaviour on such networks [24, 11].

At the same time, a more established approach to epidemics on clustered networks exists 
in the form of pairwise equations, where an approximation is made to produce a system 
of ordinary differential equations (ODEs) that depend on a small number of real-valued 
parameters for the network [14]. These approximations have been consistently found to 
be in good qualitative agreement with stochastic simulations [13], and the lack of rigorous justification for the closure used is compensated for in several ways. For example, a system of ODEs can easily be integrated, involves few parameters, and has a rigorous definition of critical intervention thresholds. There is also the benefit that analytic manipulations can be performed on the equations involved to aid theoretical understanding [23].

One of the key uses of epidemic modelling is to predict outcomes and critical interven- 
tion thresholds on the basis of observables, which can be done analytically for the special 
case of household models [10]. In this paper we consider the insights that can be gained
for epidemic dynamics on networks with a general clustered structure from consideration
of pairwise equations. We start by reviewing standard epidemic and network theory. We 
then outline our pairwise methodology, including the definition of observable reproduction 
numbers and control thresholds. We then consider the implication of this for intervention 
and outcome prediction. 

The SIR model 

One of the simplest approaches to the problem of epidemic prediction is the SIR model, developed over 80 years ago [16]. This basic paradigm has been extended and applied fruitfully to a wide range of diseases [1]. In the absence of births and deaths, the dynamics for this model can be written in terms of the proportions of the population susceptible, $S(t)$, infectious, $I(t)$, and removed, $R(t)$, as

$$
\begin{aligned}
\dot{S}(t) &= -\beta S(t) I(t) , \\
\dot{I}(t) &= \beta S(t) I(t) - \gamma I(t) , \\
\dot{R}(t) &= \gamma I(t) .
\end{aligned}
\tag{1}
$$

This model has two parameters: $\gamma$, the rate of recovery from the infectious state; and $\beta$, the rate at which new cases are created when susceptible and infectious individuals interact. An important quantity that emerges from analysis of many epidemic models is the basic reproductive ratio, $R_0$, which we define (standardly) as the average number of secondary cases produced by an average infectious individual early on, once the epidemic dynamical system has settled onto its dominant eigenvector. In the simple SIR model given by equations (1), this quantity as defined is equal to $\beta/\gamma$.
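To make equations (1) concrete, the following is a minimal sketch that integrates them with a fixed-step classical Runge-Kutta (RK4) scheme in plain Python. The step size, time horizon and initial condition are illustrative assumptions rather than values from the paper. Because $\dot{I} = \gamma I (R_0 S - 1)$, prevalence should peak where $S = \gamma/\beta = 1/R_0$, which gives a simple check on the output.

```python
def sir_rhs(state, beta, gamma):
    """Right-hand side of equations (1) for the proportions (S, I, R)."""
    s, i, _ = state
    infection = beta * s * i
    recovery = gamma * i
    return (-infection, infection - recovery, recovery)


def rk4_step(state, dt, beta, gamma):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    k1 = sir_rhs(state, beta, gamma)
    k2 = sir_rhs(tuple(x + 0.5 * dt * k for x, k in zip(state, k1)), beta, gamma)
    k3 = sir_rhs(tuple(x + 0.5 * dt * k for x, k in zip(state, k2)), beta, gamma)
    k4 = sir_rhs(tuple(x + dt * k for x, k in zip(state, k3)), beta, gamma)
    return tuple(
        x + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def integrate_sir(beta, gamma, i0=1e-6, t_max=60.0, dt=0.01):
    """Integrate from S(0) = 1 - I(0), R(0) = 0 and return (times, states)."""
    state = (1.0 - i0, i0, 0.0)
    times, states = [0.0], [state]
    for step in range(1, int(round(t_max / dt)) + 1):
        state = rk4_step(state, dt, beta, gamma)
        times.append(step * dt)
        states.append(state)
    return times, states


if __name__ == "__main__":
    beta, gamma = 2.0, 1.0  # so R_0 = beta / gamma = 2
    times, states = integrate_sir(beta, gamma)
    peak = max(range(len(states)), key=lambda k: states[k][1])
    # Prevalence peaks where dI/dt = 0, i.e. where S = gamma / beta = 1 / R_0.
    print(f"peak at t = {times[peak]:.2f} with S = {states[peak][0]:.3f}")
    print(f"final size R(t_max) = {states[-1][2]:.4f}")
```

A fixed-step scheme keeps the sketch dependency-free; since $S + I + R$ is a linear invariant of (1), RK4 preserves it up to rounding error.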

Strictly speaking, since the system is nonlinear, we should also consider $I(0)$, the initial proportion of the population that is infectious, and $S(0)$, the initial proportion of the population that is susceptible (since they are proportions, the dynamical variables obey $S + I + R = 1$ at all times). In this work, we take $I(0) \ll 1$, and furthermore assume $S(0) \approx 1$. Given these assumptions, for SIR dynamics in a homogeneous population with arbitrary recovery time distribution, the final size of the epidemic, $R_\infty$, will be given by a solution to the transcendental equation below [16]:

$$
R_\infty = 1 - e^{-R_0 R_\infty} . \tag{2}
$$
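Equation (2) always has the trivial root $R_\infty = 0$ and, for $R_0 > 1$, exactly one further root in $(0, 1)$, because $g(x) = x - 1 + e^{-R_0 x}$ is convex with $g(0) = 0$, $g'(0) = 1 - R_0 < 0$ and $g(1) = e^{-R_0} > 0$. A minimal sketch of finding that root by bisection follows; the function name and tolerance are our own choices.

```python
import math


def final_size(r0, tol=1e-12):
    """Non-trivial root of R = 1 - exp(-r0 * R), equation (2), by bisection.

    Assumes S(0) ~ 1 and I(0) << 1; returns 0.0 when r0 <= 1.
    """
    if r0 <= 1.0:
        return 0.0

    def g(x):
        # g < 0 between 0 and the root, g > 0 between the root and 1.
        return x - 1.0 + math.exp(-r0 * x)

    lo, hi = 1e-12, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for r0 in (0.8, 1.5, 2.0, 3.0):
        print(r0, round(final_size(r0), 4))
```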

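We can also write down exact expressions for the peak height and time in the simple SIR model. Prevalence peaks when $S = 1/R_0$, and the quantity $S + I - R_0^{-1} \ln S$ is conserved by equations (1), which gives

$$
I_{\max} = I(0) + S(0) - \frac{1 + \ln\left(R_0 S(0)\right)}{R_0} \approx 1 - \frac{1 + \ln R_0}{R_0} , \qquad
t_{\max} = \int_{1/R_0}^{S(0)} \frac{\mathrm{d}S}{\beta S \left[ I(0) + S(0) - S + R_0^{-1} \ln\left(S/S(0)\right) \right]} , \tag{3}
$$

where the integral for $t_{\max}$ has no closed form in elementary functions and depends on $I(0)$.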

It is another relevant standard result that, if a small amount of infection $I(0)$ is introduced into an otherwise susceptible population, then this model predicts that at early times the proportion of infectious individuals in the population will be given by $I(t) \approx I(0) e^{\gamma (R_0 - 1) t}$ [1].
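The peak prevalence of (1) follows from the conserved quantity $S + I - R_0^{-1} \ln S$ together with the fact that $\dot{I} = 0$ when $S = 1/R_0$. The sketch below evaluates that standard result, along with the early doubling time $\ln 2 / (\gamma (R_0 - 1))$ implied by the exponential growth just described. The function names are hypothetical.

```python
import math


def sir_peak_prevalence(r0, s0=1.0, i0=0.0):
    """Peak proportion infectious for the SIR model (1).

    The peak occurs at S = 1 / r0, so conservation of S + I - ln(S) / r0 gives
    I_max = i0 + s0 - (1 + ln(r0 * s0)) / r0. If r0 * s0 <= 1, prevalence only
    declines, and the peak is the initial value i0.
    """
    if r0 * s0 <= 1.0:
        return i0
    return i0 + s0 - (1.0 + math.log(r0 * s0)) / r0


def doubling_time(r0, gamma):
    """Early doubling time from I(t) ~ I(0) exp(gamma * (r0 - 1) * t)."""
    return math.log(2.0) / (gamma * (r0 - 1.0))


if __name__ == "__main__":
    print(sir_peak_prevalence(2.0), doubling_time(2.0, gamma=1.0))
```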

Network theory 

We consider a network of $N$ nodes labelled $i = 1, \ldots, N$, defined by its adjacency matrix $G_{ij}$, which takes the value 1 when nodes $i$ and $j$ are connected and the value 0 otherwise. We define this also to be symmetric and without self-edges, so $G_{ij} = G_{ji}$ and $G_{ii} = 0$.
Since we are considering the impact of clustering, we try to remove the impact of node degree heterogeneity, which can confound the effects of clustering [17], by assuming that every individual has the same neighbourhood size $n$:

$$
\forall i , \quad \sum_j G_{ij} = n . \tag{4}
$$



It is worth noting that there is no reason why our methodology should not be extended to heterogeneous degree distributions; our choice of neighbourhood regularity is simply to consider the impact of clustering independently of other network statistics. The clustering coefficient $\phi$ is defined by

$$
\phi = \frac{\sum_{i,j,k} G_{ij} G_{jk} G_{ki}}{\sum_{i,j,k} G_{ij} G_{jk} \left( 1 - \delta_{ik} \right)} \in [0, 1] , \tag{5}
$$



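where $\delta_{ik}$ is the Kronecker delta. If the dynamical state of a node $i$ is indicated by $A_i$ (equal to 1 if node $i$ is in state $A$ and 0 otherwise), then we also use the notation

$$
[A] = \sum_i A_i , \qquad [AB] = \sum_{i,j} A_i B_j G_{ij} , \qquad [ABC] = \sum_{i,j,k} A_i B_j C_k G_{ij} G_{jk} . \tag{6}
$$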
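As an illustration of definitions (4) to (6), the sketch below builds a ring lattice in which every node links to its $n/2$ nearest neighbours on each side (a hypothetical test graph, chosen because it is $n$-regular for even $n$ and clustered; for $n = 4$ it has $\phi = 1/2$), then evaluates $\phi$ and the single and pair counts directly from the adjacency matrix. Triple counts are omitted, and all function names are our own.

```python
def ring_lattice(num_nodes, n):
    """Hypothetical test graph: each node links to n // 2 neighbours on each side."""
    g = [[0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        for step in range(1, n // 2 + 1):
            j = (i + step) % num_nodes
            g[i][j] = g[j][i] = 1
    return g


def clustering_coefficient(g):
    """phi from (5): closed connected triples over all connected triples with i != k."""
    size = len(g)
    closed = connected = 0
    for i in range(size):
        for j in range(size):
            if not g[i][j]:
                continue
            for k in range(size):
                if k != i and g[j][k]:
                    connected += 1     # contributes G_ij G_jk (1 - delta_ik)
                    closed += g[k][i]  # contributes G_ij G_jk G_ki
    return closed / connected


def singles_and_pairs(g, state, a, b):
    """[A] and [AB] from (6); state[i] is the label (e.g. 'S', 'I', 'R') of node i."""
    size = len(g)
    single = sum(1 for i in range(size) if state[i] == a)
    pair = sum(
        g[i][j]
        for i in range(size)
        for j in range(size)
        if state[i] == a and state[j] == b
    )
    return single, pair


if __name__ == "__main__":
    g = ring_lattice(10, 4)
    print(clustering_coefficient(g))  # 0.5 for this 4-regular ring lattice
    state = ["I"] + ["S"] * 9
    print(singles_and_pairs(g, state, "S", "I"))  # (9, 4): node 0 has four S neighbours
```

Note that $[AB]$ counts ordered pairs, so a fully susceptible $n$-regular network has $[SS] = nN$.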

Methods 

Pairwise equations

The pairwise equations for an SIR epidemic are standardly given by 

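$$
\begin{aligned}
\frac{\mathrm{d}[S]}{\mathrm{d}t} &= -\tau [SI] , \\
\frac{\mathrm{d}[I]}{\mathrm{d}t} &= \tau [SI] - \gamma [I] , \\
\frac{\mathrm{d}[R]}{\mathrm{d}t} &= \gamma [I] , \\
\frac{\mathrm{d}[SS]}{\mathrm{d}t} &= -2\tau [SSI] , \\
\frac{\mathrm{d}[SI]}{\mathrm{d}t} &= \tau \left( [SSI] - [ISI] - [SI] \right) - \gamma [SI] , \\
\frac{\mathrm{d}[II]}{\mathrm{d}t} &= 2\tau \left( [ISI] + [SI] \right) - 2\gamma [II] , \\
\frac{\mathrm{d}[SR]}{\mathrm{d}t} &= -\tau [ISR] + \gamma [SI] , \\
\frac{\mathrm{d}[IR]}{\mathrm{d}t} &= \tau [ISR] + \gamma \left( [II] - [IR] \right) , \\
\frac{\mathrm{d}[RR]}{\mathrm{d}t} &= 2\gamma [IR] .
\end{aligned}
\tag{7}
$$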



These equations are exact but unclosed. While we restrict ourselves to the simple SIR model, it is possible to include more complex disease natural histories, such as the SEIR model presented in [21]. To close this system, we approximate the triples using the standard approximation for clustered systems [14],

$$
[ABC] \approx \frac{n-1}{n} \frac{[AB][BC]}{[B]} \left( (1 - \phi) + \phi \frac{N}{n} \frac{[CA]}{[A][C]} \right) . \tag{8}
$$



While there are several ways to recover the standard SIR model from network systems, for the pairwise approach the most natural limit is to take $n \to \infty$ while holding $\beta = n\tau$ constant. We therefore compare the scaled transmission rate $n\tau/\gamma$ to other quantities, since this converges to all other reproduction numbers in the appropriate limit. Another quantity that converges to other reproduction numbers in the homogeneous limit is $(n-1)\tau/(\tau + \gamma)$, which is $R_0$ for an unclustered network. Since this is a closed function of the scaled transmission rate, we do not present additional results, but note that it is a quantity that may well be estimated or inferred from data. The system of equations (7) can be numerically integrated using standard methods such as Runge-Kutta, although our experience is that more sophisticated solvers capable of switching to stiff methods are significantly more accurate and numerically efficient.
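The following sketch integrates the closed system (7) with closure (8) using fixed-step RK4 in plain Python. It assumes counts rather than proportions, an initial condition in which a small proportion $i_0$ of nodes is infected at random with uncorrelated pairs, and a convention that the clustered correction in (8) is set to zero when $[A][C] = 0$ (this only happens at $t = 0$, when $[R] = 0$ and the triple $[ISR]$ vanishes anyway). Step size and parameters are illustrative, and as noted above an adaptive stiff solver is preferable in practice.

```python
def closure(ab, bc, ca, a, b, c, n, phi, num_nodes):
    """Triple closure (8): [ABC] from pairs [AB], [BC], [CA] and singles."""
    if b <= 0.0:
        return 0.0
    base = (n - 1) / n * ab * bc / b
    # Clustered correction phi * (N / n) * [CA] / ([A][C]); taken as 0 if [A][C] = 0.
    correction = phi * num_nodes / n * ca / (a * c) if a * c > 0.0 else 0.0
    return base * ((1.0 - phi) + correction)


def pairwise_rhs(y, tau, gamma, n, phi, num_nodes):
    """Right-hand side of the pairwise SIR equations (7)."""
    s, i, r, ss, si, ii, sr, ir, rr = y
    ssi = closure(ss, si, si, s, s, i, n, phi, num_nodes)  # [SSI], using [IS] = [SI]
    isi = closure(si, si, ii, i, s, i, n, phi, num_nodes)  # [ISI]
    isr = closure(si, sr, ir, i, s, r, n, phi, num_nodes)  # [ISR], using [RI] = [IR]
    return (
        -tau * si,                                  # [S]
        tau * si - gamma * i,                       # [I]
        gamma * i,                                  # [R]
        -2.0 * tau * ssi,                           # [SS]
        tau * (ssi - isi - si) - gamma * si,        # [SI]
        2.0 * tau * (isi + si) - 2.0 * gamma * ii,  # [II]
        -tau * isr + gamma * si,                    # [SR]
        tau * isr + gamma * (ii - ir),              # [IR]
        2.0 * gamma * ir,                           # [RR]
    )


def initial_state(num_nodes, n, i0):
    """Random seeding of a proportion i0 with uncorrelated pairs (an assumption)."""
    pairs = n * num_nodes
    return (
        num_nodes * (1.0 - i0), num_nodes * i0, 0.0,
        pairs * (1.0 - i0) ** 2, pairs * i0 * (1.0 - i0), pairs * i0 ** 2,
        0.0, 0.0, 0.0,
    )


def integrate_rk4(rhs, y, dt, steps, *args):
    """Fixed-step classical RK4; returns the list of states, including the initial one."""
    trajectory = [y]
    for _ in range(steps):
        k1 = rhs(y, *args)
        k2 = rhs(tuple(v + 0.5 * dt * k for v, k in zip(y, k1)), *args)
        k3 = rhs(tuple(v + 0.5 * dt * k for v, k in zip(y, k2)), *args)
        k4 = rhs(tuple(v + dt * k for v, k in zip(y, k3)), *args)
        y = tuple(
            v + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for v, a, b, c, d in zip(y, k1, k2, k3, k4)
        )
        trajectory.append(y)
    return trajectory


if __name__ == "__main__":
    N, n, phi = 10_000.0, 4, 0.2
    traj = integrate_rk4(pairwise_rhs, initial_state(N, n, 1e-4), 0.01, 10_000,
                         1.0, 1.0, n, phi, N)  # tau = gamma = 1 (illustrative)
    print(f"final size: {traj[-1][2] / N:.3f}")
```

Both $[S] + [I] + [R] = N$ and $[SS] + 2[SI] + [II] + 2[SR] + 2[IR] + [RR] = nN$ are conserved by (7), which makes a useful check on any implementation of the closed system.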

The closure (8) can be used for any graph where the parameters $N$, $n$ and $\phi$ are known. This closure is intended to be applied to graphs where each node has degree close to $n$, although closures appropriate to more heterogeneous systems do exist [13]. Since the first application of this closure to epidemic systems, attempts have been made to provide rigorous justification of its validity [21]. While there are no exact results at present, from previous work we expect that the agreement is likely to be best for regular (degree-homogeneous) configuration-model networks where a small amount of clustering has been introduced through rewiring [13], and that agreement will be poor in the presence of degree heterogeneity [8, 10], where shortest path lengths are long [22], or when the network motif structure is not well captured by the single parameter $\phi$ [12].

Solution by linearising Ansatz 

One way to gain analytic traction on equations (7) is through linearisation of the system,
representing the situation early in the epidemic when the number of susceptibles has not 
been significantly depleted. A straightforward method is to define an Ansatz representing 
the intuition that all dynamical variables should have their behaviour determined by the 
prevalence of infection, and then to confirm that this Ansatz is indeed a consistent solution. 

We start by defining the early growth reproduction number $r_0$ through early asymptotic growth in the proportion of infectious individuals,

$$
I(t) =: I_{\mathrm{start}} \, e^{\gamma (r_0 - 1) t} , \tag{9}
$$

where $I_{\mathrm{start}}$ is the proportion of infectious individuals at the start of the period of exponential growth. The quantity $r_0$ can therefore be measured by fitting an exponential curve to the early growth in the number of cases at the start of an epidemic, e.g. [5]. Where $I_{\mathrm{start}} \ll 1$, we propose linearisation of the system of epidemic equations using the following Ansatz:

$$
\begin{aligned}
\dot{I}(t) &= \gamma (r_0 - 1) I(t) , \\
[A] &= [A]_0 + k_A I(t) , \\
[AB] &= [AB]_0 + k_{AB} I(t) .
\end{aligned}
\tag{10}
$$

Putting these substitutions into the system (7) closed by (8) and solving algebraically for $\{r_0, k_A, k_{AB}\}$ at given $\tau$, $\gamma$, $n$ and $\phi$ allows us to parameterise the dynamical system. Algorithmically, this offers significant advantages over, say, an iterative scheme that involves repeated integration and modification of underlying parameters to match an early growth curve.
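The algebra behind the Ansatz can be sketched numerically under a simplifying assumption of ours (not stated in the text): at leading order only $[SI] = a I$ and $[II] = b I$ need to be tracked, with $[S] = (1 - v)N$ and $[SS] = (1 - v)^2 nN$ held at their initial values. Substituting into the $[I]$, $[SI]$ and $[II]$ equations of (7), closed by (8), and dropping terms of order $I^2$ gives $\gamma (r_0 - 1) = \tau a - \gamma$, so $r_0 = \tau a / \gamma$, plus two algebraic equations for $a$ and $b$, solved below by bisection. The helper names are hypothetical, and the bracket assumes $\phi$ is small enough (for example $\phi \le 0.5$) that $b$ stays positive.

```python
def ii_ratio(a, tau, gamma, n, phi, s):
    """b = [II]/[I] implied by the [II] equation of (7) under the Ansatz."""
    denominator = tau * a + gamma - 2.0 * tau * (n - 1) * phi * a * a / (n * n * s)
    return 2.0 * tau * a / denominator


def si_residual(a, tau, gamma, n, phi, s):
    """Zero when a = [SI]/[I] is consistent with the [SI] equation of (7)."""
    b = ii_ratio(a, tau, gamma, n, phi, s)
    return ((n - 1) * s * (1.0 - phi) + (n - 1) * phi * a / n
            - (n - 1) * phi * a * b / (n * n * s) - 1.0 - a)


def early_growth_r0(tau, gamma, n, phi, v=0.0, iterations=200):
    """Leading-order r0 = tau * a / gamma, solving si_residual(a) = 0 by bisection.

    The root lies in [0, (n - 1)(1 - v) - 1] when the residual is positive at 0;
    otherwise no positive pair ratio exists and 0.0 is returned as a sentinel.
    """
    s = 1.0 - v
    lo, hi = 0.0, (n - 1) * s - 1.0
    if hi <= 0.0 or si_residual(lo, tau, gamma, n, phi, s) <= 0.0:
        return 0.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if si_residual(mid, tau, gamma, n, phi, s) > 0.0:
            lo = mid
        else:
            hi = mid
    return tau * 0.5 * (lo + hi) / gamma


if __name__ == "__main__":
    for phi in (0.0, 0.1, 0.2, 0.3):
        print(phi, early_growth_r0(tau=1.0, gamma=1.0, n=4, phi=phi))
```

With $\phi = 0$ this reduces to $r_0 = (n-2)\tau/\gamma$, or $(\tau/\gamma)((n-1)(1-v) - 1)$ with vaccination, and increasing $\phi$ lowers $r_0$, consistent with clustering reducing epidemic potential.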

Definition of $R_0$

The primary difficulty of defining a 'true' basic reproductive ratio that is both a threshold and corresponds to the verbal definition in a clustered population is that, even early in the epidemic, infectious individuals share susceptible contacts, and this competing risk does not allow the argumentation used for locally tree-like networks or well-mixed populations.

In [23], reproduction numbers were defined analytically for clustered systems based on moment closure. We consider a different, complementary method based on generation counting, which can be applied to any epidemic model (including stochastic and individual-based simulations) but which is numerical rather than analytic. Figure 1 shows the method schematically. Firstly, the system is run until the early network correlation structure has equilibrated and the dynamical system sits on its dominant eigenvector. We then relabel all infecteds as the 0-th generation $I_0$, and label two subsequent generations of infection $I_1$ and $I_2$, which recover to $R_1$ and $R_2$ respectively, before letting the epidemic proceed until no infectious individuals of type $I_1$ and $I_2$ remain. The basic reproduction number $R_0$ is given by the final value of $R_2/R_1$. To demonstrate this technique applied to the standard SIR model, consider the linearised equations

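$$
\dot{I}_0 = -\gamma I_0 , \qquad \dot{I}_1 = \beta I_0 - \gamma I_1 , \qquad \dot{I}_2 = \beta I_1 - \gamma I_2 . \tag{11}
$$

The solution to this system is

$$
I_0 = I(0) e^{-\gamma t} , \qquad I_1 = I(0) \beta t \, e^{-\gamma t} , \qquad I_2 = \tfrac{1}{2} I(0) (\beta t)^2 e^{-\gamma t} , \tag{12}
$$

which reproduces the standard $R_0 = \beta/\gamma$ when the ratio of the areas under the $I_2$ and $I_1$ curves is evaluated in the limit $t \to \infty$, since $\int_0^\infty I_1 \, \mathrm{d}t = I(0)\beta/\gamma^2$ and $\int_0^\infty I_2 \, \mathrm{d}t = I(0)\beta^2/\gamma^3$.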
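The generation-counting calculation for (11) can be checked numerically. The sketch below adds recovered classes with $\dot{R}_1 = \gamma I_1$ and $\dot{R}_2 = \gamma I_2$, integrates from $I_0(0) = 1$ with RK4, and returns $R_2/R_1$, which should approach $\beta/\gamma$. The integration horizon and step size are illustrative assumptions.

```python
def generation_rhs(y, beta, gamma):
    """Linearised SIR (11) with labelled generations and their recovered classes."""
    i0, i1, i2, _, _ = y
    return (
        -gamma * i0,             # generation 0 only recovers
        beta * i0 - gamma * i1,  # generation 1 is infected by generation 0
        beta * i1 - gamma * i2,  # generation 2 is infected by generation 1
        gamma * i1,              # R_1 accumulates recovered generation-1 cases
        gamma * i2,              # R_2 accumulates recovered generation-2 cases
    )


def generation_ratio(beta, gamma, t_max=60.0, dt=0.01):
    """Integrate with RK4 from I_0 = 1 and return R_2 / R_1 at t_max."""
    y = (1.0, 0.0, 0.0, 0.0, 0.0)
    for _ in range(int(round(t_max / dt))):
        k1 = generation_rhs(y, beta, gamma)
        k2 = generation_rhs(tuple(v + 0.5 * dt * k for v, k in zip(y, k1)), beta, gamma)
        k3 = generation_rhs(tuple(v + 0.5 * dt * k for v, k in zip(y, k2)), beta, gamma)
        k4 = generation_rhs(tuple(v + dt * k for v, k in zip(y, k3)), beta, gamma)
        y = tuple(
            v + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for v, a, b, c, d in zip(y, k1, k2, k3, k4)
        )
    return y[4] / y[3]


if __name__ == "__main__":
    print(generation_ratio(2.0, 1.0))  # close to beta / gamma = 2
```

For a network or simulation model the same bookkeeping applies, but the labelled generations would be tracked inside that model rather than through (11).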

There are, of course, some epidemic models such as simulations based on lattices or 
highly complex individual behaviour where there is not an obvious dynamical system under- 
lying the model, and early behaviour of the epidemic does not involve exponential growth. 
In these systems, the method of generation counting will still give an answer that closely matches the standard definition of $R_0$ [7], but with spatial structure playing a role comparable to that of risk structure. For example, if a disease is sufficiently transmissible to invade a square lattice, then the quantity $R_2/R_1$ will asymptote to unity, as would be expected of $R_0$. Furthermore, if there is a phase in a system's early dynamics (before the proportion of susceptibles in the whole population has been significantly reduced) where the ratio $R_2/R_1$ reaches quasi-equilibrium, then this constant quantity will provide a threshold for epidemic invasion.

Vaccination 

The parameters $n\tau/\gamma$, $r_0$ and $R_0$ as defined above are observable early in an epidemic, but
do not directly lead to critical intervention thresholds for network epidemic models in the 
same way as in the standard SIR model. 

In this paper, we consider two distinct interventions: reducing transmission by a proportion $\varepsilon$, so that $\tau \to (1 - \varepsilon)\tau$; and placing a proportion $v$ of the population, chosen randomly at the individual level, in the recovered class at the start of the epidemic, so that

$$
[S]_0 = 1 - v , \qquad [SS]_0 = (1 - v)^2 . \tag{13}
$$

We preserve the degree distribution of the network by placing nodes in the dynamically inert recovered class, so that e.g. $[R]_0 = v$; this is in contrast to modelling vaccination by modification of the network topology, and allows (8) to remain valid. The critical values sufficient to contain an epidemic, $\varepsilon_c$ and $v_c$, can be calculated by use of the linearising Ansatz as above, then finding the values at which the predicted $r_0(v, \varepsilon)$ is 1. So that we are comparing similar quantities, we use these critical values to define 'leaky' and 'vaccination' reproduction numbers

$$
R_L = \frac{1}{1 - \varepsilon_c} , \qquad R_V = \frac{1}{1 - v_c} . \tag{14}
$$

This terminology is taken from household models, where analytic results can be obtained relating different reproduction numbers [10].



Results 
Analytic results 

We start by considering some analytic results that can be obtained from the pairwise system. Our methods are developed with numerical solution in mind; however, some closed expressions can be derived by substituting (8) and (10) into (7). At small clustering in a regular graph with $n$ links per node, we can calculate the first-order impact of clustering on $r_0$,

$$
r_0 = \frac{(n-2)\tau}{\gamma} - \phi \, \frac{2(n-1) \left( 2(n-1)(n-2)\tau + n\gamma \right) \tau}{n^2 \gamma \left( (n-2)\tau + \gamma \right)} + O(\phi^2) , \tag{15}
$$

which demonstrates the standard result that, leaving other parameters constant, clustering reduces epidemic potential. Unclosed expressions to all orders in $\phi$ can be found in [23, Eqns. (14, 19) with $r_* \to r_0$]. In structured populations, there is a conceptual difference between an intervention that reduces transmission by a fraction $\varepsilon$ and a random, completely protective vaccination of a proportion $v$ of the population. In the absence of clustering, this is reflected in the difference between leaky and vaccinated early growth reproduction numbers, as below:

$$
r_0(\varepsilon) = (1 - \varepsilon) \frac{\tau}{\gamma} (n - 2) , \qquad r_0(v) = \frac{\tau}{\gamma} \left( (n-1)(1 - v) - 1 \right) . \tag{16}
$$

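Interestingly, where $\phi = 0$, this means that $R_L = r_0$ (setting $r_0(\varepsilon_c) = 1$ in (16) gives $1/(1 - \varepsilon_c) = (n-2)\tau/\gamma$), and also

$$
R_V = \frac{1}{1 - v_c} = (n-1) \frac{\tau}{\tau + \gamma} = R_0 . \tag{17}
$$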

In the clustered case, these relationships between observables and intervention-related thresholds do not hold exactly; however, we will show that they can remain numerically close.
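For $\phi = 0$, the route from the early growth numbers to (14) can be checked directly. The sketch finds $\varepsilon_c$ and $v_c$ by bisection on $r_0 = 1$, using the unclustered expressions $r_0(\varepsilon) = (1 - \varepsilon)(n-2)\tau/\gamma$ and $r_0(v) = (\tau/\gamma)((n-1)(1-v) - 1)$ in place of a general clustered solver, and converts them to $R_L$ and $R_V$. For clustered networks one would substitute an Ansatz-based $r_0(v, \varepsilon)$; the function names here are hypothetical.

```python
def r0_leaky(eps, tau, gamma, n):
    """Unclustered early growth number with transmission scaled by (1 - eps)."""
    return (1.0 - eps) * (n - 2) * tau / gamma


def r0_vaccinated(v, tau, gamma, n):
    """Unclustered early growth number with a proportion v vaccinated."""
    return tau / gamma * ((n - 1) * (1.0 - v) - 1.0)


def critical_value(r0_of_x, tol=1e-12):
    """Bisection for x in [0, 1] with r0_of_x(x) = 1.

    Assumes r0_of_x is decreasing, r0_of_x(0) > 1 and r0_of_x(1) < 1.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if r0_of_x(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def intervention_reproduction_numbers(tau, gamma, n):
    """Return (R_L, R_V) as defined in (14) from the critical values eps_c and v_c."""
    eps_c = critical_value(lambda e: r0_leaky(e, tau, gamma, n))
    v_c = critical_value(lambda v: r0_vaccinated(v, tau, gamma, n))
    return 1.0 / (1.0 - eps_c), 1.0 / (1.0 - v_c)


if __name__ == "__main__":
    r_l, r_v = intervention_reproduction_numbers(tau=1.0, gamma=1.0, n=4)
    # Expected, up to bisection error: R_L = (n-2) tau/gamma = 2, R_V = (n-1) tau/(tau+gamma) = 1.5.
    print(r_l, r_v)
```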



Reproduction numbers 

We now consider numerical integration of the pairwise system (7). This requires parameters to be chosen, and so we consider a network with small neighbourhood size $n = 4$ and vary clustering from $\phi = 0$ to 0.3. We consider two sets of results: in Figure 2 we consider the relationship between different observable and intervention reproduction numbers; and in Figure 3 we consider the relationship between different observables and epidemic outcomes.

Looking at Figure 2, we see a consistent ordering $(n\tau/\gamma) > R_L \geq r_0 > R_V \geq R_0$, as would be expected from exact results for household models [10]. We also find, however, that even when they do not agree, the early growth observable $r_0$ is strongly predictive of the leaky vaccination threshold $R_L$, while the basic reproductive ratio $R_0$ is strongly predictive of the random, effective vaccination threshold $R_V$. This is the case despite the fact that other pairs of reproduction numbers (e.g. $r_0$ and $R_0$) can differ very significantly at different levels of clustering.
Outcomes

We now consider the predictive power of different quantities that are observable early in the 
epidemic: the transmission rate, early growth rate and basic reproduction number. Our 
selection of outcomes is the final size (also called attack rate), which determines the overall 
proportion of the population that has suffered from the disease, the peak prevalence, which 
is important for assessing the maximum burden of clinical disease during an epidemic, and 
the time to peak, which determines how much time is available to prepare for the peak 
burden. 

Looking at Figure 3, we see several features. At a constant transmission rate (top row), epidemic peaks are lower and occur later as clustering is increased. At low transmission rates, clustering decreases the attack rate, but at higher transmission rates, clustering increases the attack rate. This latter, counter-intuitive effect is seen in exact results for special clustered network types, but due to the extremely high attack rates involved, it is not easy to reproduce in simulation. It is possible that this effect may be much larger for different disease natural histories, and so it may be of practical as well as theoretical interest.

At constant $r_0$ (middle row), epidemics in clustered populations have larger attack rates and peaks. While peak times are also slightly reduced, the early growth rate is strongly predictive of peak time. This predictive power can be understood by considering the implications of exponential early growth. When an epidemic peaks, this is due to depletion of susceptibles below the level required for continued transmission of infection. Exponential growth, by its nature, places strong bounds on the range of times for which susceptible depletion can be appreciable, and hence the rate of such growth is strongly predictive of peak time. It is worth noting that the absolute times to peak will be strongly dependent on the small initial number infectious, which we take as $I(0) = 10^{-\ldots}$. Where $r_0$ is held constant as $I(0)$ is varied, different peak time curves will all be shifted right or left by the same amount, since each epidemic experiences the same early growth rate. For other rows of Figure 3, the shift will need to be determined from the value of $r_0$ at a given $R_0$ or $n\tau/\gamma$, as shown in Figure 2.
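The claim that changing $I(0)$ at fixed $r_0$ only shifts the peak can be illustrated with the homogeneous SIR model (1), used here as a stand-in for the pairwise system. From (9), reducing $I(0)$ tenfold should delay the peak by about $\ln 10 / (\gamma (r_0 - 1))$, with $r_0 = \beta/\gamma$ in this case. The sketch locates the peak on a fixed RK4 time grid, so peak times are only resolved to the step size; parameters are illustrative assumptions.

```python
import math


def sir_peak_time(beta, gamma, i0, t_max=60.0, dt=0.01):
    """Time of maximum prevalence for the SIR model (1), on a fixed RK4 grid."""
    def rhs(s, i):
        new_infections = beta * s * i
        return -new_infections, new_infections - gamma * i

    s, i = 1.0 - i0, i0
    best_t, best_i = 0.0, i
    for step in range(1, int(round(t_max / dt)) + 1):
        k1 = rhs(s, i)
        k2 = rhs(s + 0.5 * dt * k1[0], i + 0.5 * dt * k1[1])
        k3 = rhs(s + 0.5 * dt * k2[0], i + 0.5 * dt * k2[1])
        k4 = rhs(s + dt * k3[0], i + dt * k3[1])
        s += dt / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0])
        i += dt / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1])
        if i > best_i:
            best_t, best_i = step * dt, i
    return best_t


if __name__ == "__main__":
    beta, gamma = 2.0, 1.0
    shift = sir_peak_time(beta, gamma, 1e-7) - sir_peak_time(beta, gamma, 1e-6)
    predicted = math.log(10.0) / (beta - gamma)  # gamma * (r0 - 1) = beta - gamma
    print(f"simulated shift {shift:.2f}, predicted {predicted:.2f}")
```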

Similarly to the case of $r_0$, at constant $R_0$ (bottom row) the consequences of introducing clustering are uniformly larger epidemics, with larger, earlier peaks. This reversal of outcome prediction for $r_0$ and $R_0$ as compared to transmission rate can be understood in
the following way. Clustering frustrates the epidemic process early on in an outbreak, as 
infectious individuals compete locally for shared susceptibles. Compared to a locally tree- 
like network, however, a clustered network offers more routes for infection to travel from 
one node to another and so its effect on final outbreak size is non-trivial to predict. Once 
we have adjusted the underlying transmission rate upwards compared to an unclustered 
model to give similar ro or Rq, the outbreak will definitely have a larger size than in the 
unclustered scenario. 

Discussion 

We have argued in this paper for the merits of a pairwise approach to understanding 
epidemics in clustered populations that complements simulations and exact results for 
special network structures. Pairwise models allow simply interpretable conclusions to be 
drawn with little numerical effort, and involve a small number of real parameters. 

Our results show that for some modelling tasks, clustering does not need to be considered to get accurate results. In particular, the early growth rate predicts the critical leaky vaccine level and peak time, and $R_0$ predicts the critical vaccination threshold. For other important calculations, however, the presence of significant clustering in a population must be modelled to give an accurate prediction.

The problem of epidemic prediction and model misspecification is, of course, also posed for many other extensions of the standard SIR model. What appears to be unique about clustering is that, as clustering is increased, it predicts smaller epidemics at constant transmission rate and larger epidemics at constant early growth rate, in contrast to other forms of population structure [6], where heterogeneity leads to larger epidemics at constant transmission rate and smaller epidemics at constant $R_0$. Another way of looking at our results is, therefore, that if an epidemic has an attack rate lower than would be expected from the SIR model, and the population is clustered, then standard forms of heterogeneity must be even more significant than an unclustered model would predict, although a mathematical model of the interaction between clustering and heterogeneity (partially addressed
by [MllE]) would help to make this intuition clearer. 
Figure 1, panel (5): $I_1$ individuals recover to $R_1$, while $I_2$ individuals generate unlabelled infectious individuals $I$, which generate other $I$ individuals.
Figure 1, panel (6): $I_2$ individuals recover to $R_2$, while $I$ individuals recover to $R$, and the epidemic continues until the end.



Figure 1: Calculating the actual basic reproduction number through generation counting. Susceptible individuals are not labelled for clarity. The basic reproductive number $R_0$ is given by the value of $R_2/R_1$ at the end of the epidemic.
Figure 2: Observables and reproduction numbers for a clustered network with $n = 4$. The scaled transmission rate $n\tau/\gamma$, early growth $r_0$, full $R_0$, vaccination $R_V$ and leaky $R_L$ are all plotted against each other, showing a diversity of relationships.
Figure 3: Outcomes for different observables for a clustered network with $n = 4$. The outcomes of final size, peak height and time are shown in columns, while rows correspond to constant scaled transmission rate $n\tau/\gamma$, early growth $r_0$ and full $R_0$.